97 research outputs found
D-TrAttUnet: Dual-Decoder Transformer-Based Attention Unet Architecture for Binary and Multi-classes Covid-19 Infection Segmentation
In the last three years, the world has been facing a global crisis caused by
the Covid-19 pandemic. Medical imaging has played a crucial role in the
fight against this disease and in saving human lives. Indeed, CT scans have
proved their efficiency in diagnosing, detecting, and following up Covid-19
infection. In this paper, we propose a new Transformer-CNN based approach for
Covid-19 infection segmentation from CT slices. The proposed D-TrAttUnet
architecture has an Encoder-Decoder structure, with a compound Transformer-CNN
encoder and Dual-Decoders. The Transformer-CNN encoder is built
using Transformer layers, UpResBlocks, ResBlocks and max-pooling layers. The
Dual-Decoder consists of two identical CNN decoders with attention gates. The
two decoders segment the infection and lung regions
simultaneously, and the losses of the two tasks are joined. The proposed
D-TrAttUnet architecture is evaluated for both Binary and Multi-classes
Covid-19 infection segmentation. The experimental results prove the efficiency
of the proposed approach in dealing with the complexity of the Covid-19
segmentation task from limited data. Furthermore, the D-TrAttUnet architecture
outperforms three baseline CNN segmentation architectures (Unet, AttUnet and
Unet++) and three state-of-the-art architectures (AnamNet, SCOATNet and
CopleNet) in both Binary and Multi-classes segmentation tasks.
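The idea of joining the losses of the two decoder heads can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-pixel binary cross-entropy for both the infection and lung masks and a simple weighted sum, with the weight `lam` being a hypothetical parameter.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy, averaged over the mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def joint_loss(infection_pred, infection_gt, lung_pred, lung_gt, lam=0.5):
    """Joined loss of the two decoder heads: infection term plus a
    weighted lung term. `lam` is an illustrative weighting, not a
    value taken from the paper."""
    return bce(infection_pred, infection_gt) + lam * bce(lung_pred, lung_gt)

# Toy 2x2 masks: a perfect lung prediction and an imperfect infection one.
inf_pred = np.array([[0.9, 0.1], [0.8, 0.2]])
inf_gt = np.array([[1.0, 0.0], [1.0, 0.0]])
lung_pred = np.array([[1.0, 1.0], [0.0, 0.0]])
lung_gt = np.array([[1.0, 1.0], [0.0, 0.0]])

loss = joint_loss(inf_pred, inf_gt, lung_pred, lung_gt)
print(round(loss, 4))
```

Training both tasks through a shared encoder this way lets the lung-segmentation signal regularize the harder infection-segmentation task, which is one plausible reading of why the dual-decoder design helps with limited data.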
A study on different experimental configurations for age, race, and gender estimation problems
This paper presents a detailed study of different algorithmic configurations for estimating soft biometric traits. In particular, a recently introduced common framework is the starting point of the study: it includes an initial facial detection, the subsequent facial traits description, the data reduction step, and the final classification step. The algorithmic configurations are characterized by different descriptors and different strategies to build the training dataset and to scale the data fed to the classifier. Experiments have been carried out on both publicly available datasets and image sequences specifically acquired in order to evaluate the performance under real-world conditions, i.e., in the presence of scaling and rotation.
CNN based facial aesthetics analysis through dynamic robust losses and ensemble regression
In recent years, estimating the beauty of faces has attracted growing interest in the fields of computer vision and machine
learning. This is due to the emergence of face beauty datasets (such as SCUT-FBP, SCUT-FBP5500 and KDEF-PT) and
the prevalence of deep learning methods in many tasks. The goal of this work is to leverage advances in deep
learning architectures to provide stable and accurate face beauty estimation from static face images. To this end, our
proposed approach makes three main contributions. First, to deal with the complicated high-level features associated with the
Facial Beauty Prediction (FBP) problem, we propose an architecture with two pre-trained Convolutional Neural Network (CNN)
backbones (2B-IncRex). Second, in addition to 2B-IncRex, we introduce a parabolic dynamic law to control the behavior
of the robust loss parameters during training; these robust losses are ParamSmoothL1, Huber, and Tukey. Third,
we propose an ensemble regression based on five regressors, namely Resnext-50, Inception-v3 and three
regressors based on our proposed 2B-IncRex architecture. These models are trained with the following dynamic loss
functions, respectively: Dynamic ParamSmoothL1, Dynamic Tukey, Dynamic ParamSmoothL1, Dynamic Huber, and Dynamic Tukey.
To evaluate the performance of our approach, we used two datasets: SCUT-FBP5500 and KDEF-PT. The
SCUT-FBP5500 dataset provides two evaluation scenarios defined by the database developers: a 60%-40% split and
five-fold cross-validation. Our approach outperforms state-of-the-art methods on several metrics in both evaluation scenarios of
SCUT-FBP5500. Moreover, experiments on the KDEF-PT dataset demonstrate the efficiency of our approach for estimating
facial beauty using transfer learning, despite the presence of facial expressions and limited data. These comparisons highlight
the effectiveness of the proposed solutions for FBP. They also show that the proposed dynamic robust losses lead to more
flexible and accurate estimators.
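A dynamic robust loss of the kind described above can be sketched as a Huber loss whose robustness parameter follows a parabolic schedule over the training epochs. The schedule shape, its endpoints, and the epoch counts below are illustrative assumptions, not the paper's actual parabolic law or values.

```python
import numpy as np

def huber(residual, delta):
    """Huber loss: quadratic near zero, linear in the tails."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

def parabolic_delta(epoch, total_epochs, d_min=0.5, d_max=2.0):
    """Hypothetical parabolic schedule for the Huber parameter: starts at
    d_max, dips to d_min at mid-training, then rises back, so the loss is
    most outlier-resistant in the middle of training."""
    t = epoch / total_epochs  # normalized training progress in [0, 1]
    return d_min + (d_max - d_min) * (2 * t - 1) ** 2

residuals = np.array([0.1, 1.0, 3.0])
for epoch in (0, 25, 50):
    delta = parabolic_delta(epoch, total_epochs=50)
    print(epoch, round(float(huber(residuals, delta).mean()), 4))
```

Scheduling the parameter this way changes how aggressively large residuals are down-weighted at each stage of training, which is the general mechanism a "dynamic" robust loss exploits.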
When I Look into Your Eyes: A Survey on Computer Vision Contributions for Human Gaze Estimation and Tracking
The automatic detection of eye positions, their temporal consistency, and their mapping into a line of sight in the real world (to find where a person is looking) is reported in the scientific literature as gaze tracking. This has become a very hot topic in the field of computer vision during the last decades, with a surprising and continuously growing number of application fields. A very long journey has been made from the first pioneering works, and this continuous search for more accurate solutions has been further boosted in the last decade, when deep neural networks revolutionized the whole machine learning area, and gaze tracking with it. In this arena, it is increasingly useful to find guidance in survey/review articles that collect the most relevant works and lay out clear pros and cons of existing techniques, also by introducing a precise taxonomy. This kind of manuscript allows researchers and technicians to choose the best way to move towards their application or scientific goals. In the literature there exist holistic and specifically technological survey documents (even if not updated), but, unfortunately, there is no overview discussing how the great advancements in computer vision have impacted gaze tracking. Thus, this work represents an attempt to fill this gap, also introducing a wider point of view that leads to a new taxonomy (extending the consolidated ones) by considering gaze tracking as a more exhaustive task that aims at estimating the gaze target from different perspectives: from the eye of the beholder (first-person view), from an external camera framing the beholder, from a third-person view looking at the scene in which the beholder is placed, and from an external view independent of the beholder.
Microplastic Identification via Holographic Imaging and Machine Learning
Microplastics (MPs) are a major environmental concern due to their possible impact on water pollution, wildlife, and the food chain. Reliable, rapid, and high-throughput screening of MPs from the other components of a water sample, after sieving and/or digestion, remains a highly desirable goal to avoid cumbersome visual analysis by expert users under the optical microscope. Here, a new approach is presented that combines 3D coherent imaging with machine learning (ML) to achieve accurate and automatic detection of MPs in filtered water samples over a wide range at the microscale. The water pretreatment process eliminates sediments and aggregates that fall outside the analyzed range. However, it is still necessary to clearly distinguish MPs from marine microalgae. Here, it is shown that, by defining a novel set of distinctive "holographic features," it is possible to accurately identify MPs within the defined analysis range. The process is specifically tailored to characterizing the MPs' "holographic signatures," thus boosting the classification performance and reaching an accuracy higher than 99% in classifying thousands of items. The ML approach, in conjunction with holographic coherent imaging, is able to identify MPs independently of their morphology, size, and type of plastic material.
Automatic Joint Attention Detection During Interaction with a Humanoid Robot
Joint attention is an early-developing social-communicative skill in which two people (usually a young child and an adult) share attention with regard to an interesting object or event, by means of gestures and gaze; its presence is a key element in evaluating therapy in the case of autism spectrum disorders. In this work, a novel automatic system able to detect joint attention using a completely non-intrusive depth camera installed on the room ceiling is presented. In particular, in a scenario where a humanoid robot, a therapist (or a parent) and a child are interacting, the system can detect the social interaction between them. Specifically, a depth camera mounted at the top of the room is employed to detect, first of all, the triggering event to be monitored (performed by a humanoid robot) and, subsequently, to detect any joint attention mechanism by analyzing the orientation of the head. The system operates in real time, providing the therapist with a completely non-intrusive instrument to help evaluate the quality and the precise modalities of this predominant feature during the therapy session.
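The head-orientation check at the core of such a system can be sketched geometrically. This is a hedged sketch, not the paper's method: it assumes the depth camera yields a 3D head position and a unit head-direction vector, and the 20-degree tolerance cone is an illustrative threshold.

```python
import numpy as np

def is_attending(head_pos, head_dir, target_pos, max_angle_deg=20.0):
    """True if the head direction points at the target within a tolerance
    cone. head_dir is assumed to be a unit vector; the 20-degree cone is
    an illustrative threshold, not a value from the paper."""
    to_target = np.asarray(target_pos, float) - np.asarray(head_pos, float)
    to_target /= np.linalg.norm(to_target)
    cos_angle = float(np.dot(np.asarray(head_dir, float), to_target))
    return bool(cos_angle >= np.cos(np.radians(max_angle_deg)))

# Child at the origin with the head oriented along +x; robot on the +x axis.
print(is_attending([0, 0, 0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]))  # True
# Same head orientation, but the robot is off to the side.
print(is_attending([0, 0, 0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]))  # False
```

Evaluating this predicate for both the child and the therapist against the robot's position would give one plausible per-frame signal of shared attention on the triggering event.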
- …